Data Warehouse Design with UML
Authors
Abstract
A data warehouse is a repository built from data extracted from different and possibly heterogeneous sources (e.g., databases or files). One of the main problems in integrating databases into a common repository is the possible inconsistency of the values stored in them: the very same term may appear with different values because of misspellings, permuted word order, spelling variants and so on. In this paper, we present an automatic method for reducing the inconsistency found in existing databases and thus improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. The clustered values can then be assigned a common value that, in principle, could replace the original values, making the values uniform. The method we propose provides good results with a considerably low error rate. Acceptance rate: More than 200 submissions, 68 papers accepted. AR ≈ 0.34
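The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or clustering procedure the paper uses, so everything here is an assumption: the similarity is Python's `difflib.SequenceMatcher` ratio computed on lowercased, token-sorted strings (to tolerate permuted word order), clustering is a simple greedy pass, the threshold `0.85` is arbitrary, and the function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and sort tokens so that permuted word order compares equal."""
    return " ".join(sorted(value.lower().split()))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in measure, not the paper's own."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_values(values, threshold=0.85):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([value])
    return clusters


def canonical_mapping(values, threshold=0.85):
    """Map every value to the most frequent member of its cluster."""
    mapping = {}
    for cluster in cluster_values(values, threshold):
        representative = Counter(cluster).most_common(1)[0][0]
        for value in cluster:
            mapping[value] = representative
    return mapping


if __name__ == "__main__":
    column = [
        "University of Alicante",
        "University of Alicante",
        "Univrsity of Alicante",   # misspelling
        "Alicante University of",  # permuted word order
        "Oxford University",
    ]
    for original, unified in canonical_mapping(column).items():
        print(f"{original!r} -> {unified!r}")
```

Choosing the most frequent member as the common value is only one option; a curated dictionary entry or the longest variant would work equally well in this sketch.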
Similar resources
The use of UML to design agricultural data warehouses
Recent research works propose to use the Unified Modeling Language (UML) to design data warehouses. First, the paper overviews these recent UML-based techniques. We show that UML can help system designers to build a data warehouse model. This type of model aims to describe the different analysis dimensions of the data. Second, we will also present different UML-based tools used during a project...
UML for data warehouse dimensional modeling
Dimensional modeling is a common modeling technique in data warehousing. It reflects a simple logical view of a data warehouse system. It can be easily mapped to a physical design. Traditional dimensional modeling is data-oriented and semantically informal. From a software engineering perspective, the informal notations and data-oriented feature are insufficient to tackle the complexity of larg...
A Comprehensive Method for Data Warehouse Design
A data warehouse (DW) is a complex information system primarily used in the decision making process by means of On-Line Analytical Processing (OLAP) applications. Although various methods and approaches have been presented for designing different parts of DWs, such as the conceptual and logical schemas or the Extraction-Transformation-Loading (ETL) processes, no general and standard method exists...
An Approach for Generating an XML Data Warehouse Schema using Model Transformation Language
Traditionally, the multidimensional schema of the data warehouse is derived from data sources that are mainly the company’s internal data, well-known and structured, by identifying facts, dimensions and numeric measurements through a manual analysis of the operational schemas. With the proliferation of new platforms of communication in today’s information societies, there has been growing numbe...
Uclean: a Requirement Based Object-Oriented ETL Framework
A data warehouse is used to provide effective results from multidimensional data analysis. The accuracy and correctness of these results depend on the quality of the data. To improve data quality, data must be properly extracted, transformed and loaded into the data warehouse. This ETL process is the key to the success of a data warehouse. In this paper we propose a conceptual ETL framework for a...
A Data Warehouse Engineering Process
Developing a data warehouse (DW) is a complex, time-consuming and failure-prone task. Different DW models and methods have been presented during the last few years. However, none of them addresses the whole development process in an integrated manner. In this paper, we present a DW development method, based on the Unified Modeling Language (UML) and the Unified Process (UP), which addresses the...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005